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Abstract 

Many insects harbor inherited bacterial endosymbionts. Although some of them are not strictly essential and are considered facul- 
tative, they can be a key to host survival under specific environmental conditions, such as parasitoid attacks, climate changes, or 
insecticide pressures. Thewhitefly Bemisia tabaci is at the top of the list of organisms inflicting agricultural damage and outbreaks, and 
changes in its distribution may be associated to global warming. In this work, we have sequenced and analyzed the genome of 
Cardinium cBtQ1 , a facultative bacterial endosymbiont of B. tabaci and propose that it belongs to a new taxonomic family, which also 
includes Candidatus Amoebophilus asiaticus and Cardinium cEperl, endosymbionts of amoeba and wasps, respectively. 
Reconstruction of their last common ancestors' gene contents revealed an initial massive gene loss from the free-living ancestor. 
This was followed in Cardinium by smaller losses, associated with settlement in arthropods. Some of these losses, affecting cofactor 
and amino acid biosynthetic encoding genes, took place in Cardinium cBtQ1 after its divergence from the Cardinium cEperl lineage 
and were related to its settlement in the whitef ly and its endosymbionts. Furthermore, the Cardinium cBtQ1 genome displays a large 
proportion of transposable elements, which have recently inactivated genes and produced chromosomal rearrangements. The 
genome also contains a chromosomal duplication and a multicopy plasmid, which harbors several genes putatively associated 
with gliding motility, as well as two other genes encoding proteins with potential insecticidal activity. As gene amplification is very 
rare in endosymbionts, an important function of these genes cannot be ruled out. 

Key words: Amoebophilaceae, IS elements, gliding motility, Candidatus Cardinium hertigii, host-symbiont interaction. 



Introduction 

The sweet potato whitefly Bemisia tabaci (Aleyrodidae) is an 
important polyphagous agricultural pest (Byrne and Bellows 
1991). It feeds on plant phloem, causing many economic 
losses, affecting the plants directly and/or indirectly by trans- 
mitting several viruses. Bemisia tabaci was originally described 



as a complex species comprised by many biotypes (Brown 
et al. 1995), but recently it has been suggested that it is ac- 
tually a complex of at least 24 morphologically indistinguish- 
able species (De Barro et al. 201 1), although classification in 
biotypes still persists. Among the most important biotypes are 
B (Middle East-Asia Minor 1 species), which arose in the 
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Middle East but now has a cosmopolitan distribution, and 
Q (Mediterranean species), which originated in the 
Mediterranean basin and is also widely distributed at present. 
A phylogenetic analysis of mitochondrial biotype Q individuals 
worldwide has revealed the existence of three subclades 
(Q1-Q3) (Gueguen et al. 2010). 

Similar to other phloem-feeding insects, which need essen- 
tial amino acids and other nutrients due to their unbalanced 
diet, whiteflies have established mutualistic relationships 
with bacterial endosymbionts that provide essential com- 
pounds (Baumann 2005; Moran et al. 2008; Moya et al. 

2008) . All whiteflies species harbor the obligate, or primary 
endosymbiont, " Candidates Portiera aleyrodidarum" 
(Gammaproteobacteria; hereafter Portiera) (Thao and 
Baumann 2004), which is restricted to specialized insect cells 
called bacteriocytes. In B. tabaci, several other bacterial species 
(secondary or facultative symbionts) may coexist with Portiera 
(Gottlieb et al. 2008). The most frequently observed are 
Arsenophonus sp. and "Candidates Hamiltonella defensa" 
(hereafter Hamiltonella) (both Gammaproteobacteria), which 
share bacteriocytes with Portiera. Less frequently present are 
Rickettsia sp v Wolbachia sp. (both Alphaproteobacteria) 
and " Candidates Cardinium sp." (Bacteroidetes), detected in 
bacteriocytes but also in other insect tissues. 

Symbiosis between whiteflies and Portiera has been shaped 
through a long-term relationship, mainly characterized by 
Portiera providing essential nutrients (amino acids and carot- 
enoids) to the host, thus compensating for its deficient diet, as 
revealed by genome sequencing (Santos-Garcia et al. 2012; 
Sloan and Moran 2012). However, the effects of facultative 
endosymbionts need to be determined. So far, it has been 
noted that secondary symbionts in other insects can have dif- 
ferent effects, such as reproductive manipulation, nutritional 
contribution, temperature tolerance, and defence against 
pathogens and parasitoids (Feldhaar 201 1 ; White et al. 201 1 ). 

"Ca. Cardinium hertigii" (hereafter C. hertigii) was first 
characterized in Encarsia wasps, which are parasitoids of B. 
tabaci, and it was proposed as the species type (Zchori-Fein 
etal. 2004). However, in recent years, infections with bacteria 
belonging to the genus Cardinium have been detected not 
only in whiteflies but also in other insects (armored scale, 
sharpshooters, and Culicoides spp.) and other arthropods 
(mites, ticks, spiders, and copepods). Nowadays, the infection 
rate in arthropods has been estimated as close to 7% (Zchori- 
Fein and Perlman 2004; Gruwell et al. 2009; Nakamura et al. 

2009) . Based on molecular data (16S rRNA and gyrB genes) 
and the presence of microtubule-like complexes, a morpho- 
logical feature shared by all known Cardinium, the genus 
has been divided into supergroups and strains, following a 
nomenclature similar to Wolbachia endosymbionts, with 
four described supergroups (A, B, C, and D) (Lo et al. 2002; 
Nakamura etal. 2009; Edlund etal. 2012). Similarly, in several 
arthropod taxa, C. hertigii has been described as a reproduc- 
tive manipulator through diverse effects such as feminization, 



cytoplasmic incompatibility, and induction of parthenogenesis 
(White et al. 2011). However, these effects have not been 
found in other species (e.g., B. tabaci), suggesting that 
C. hertigii might also be a mutualistic endosymbiont. This 
aforesaid claim has been supported by the recently released 
genome of Cardinium cEperl (endosymbiont of the wasp 
Encarsia pergandiella), which encodes a complete biotin bio- 
synthetic pathway, suggesting a potential role in wasp nutri- 
tion (Penz et al. 2012). The 16S rRNA gene sequences from 
several Cardinium endosymbionts of B. tabaci reported to date 
are very similar (>99%) to those of the species type C. hertigii, 
suggesting that they are strains of the same species. Several 
analyses of secondary endosymbiont coinfections revealed 
that B. tabaci individuals infected only with C. hertigii are 
very unusual, especially for the C1 strain (supergroup A), the 
most widespread Cardinium among whiteflies (Gueguen et al. 
2010). Generally, the coexistence with Hamiltonella is the 
most frequently observed, although combinations with 
other secondary symbionts, such as Wolbachia sp. or 
Rickettsia sp., are also possible (Gueguen et al. 2010; Skaljac 
etal. 2010; Park etal. 2012). 

The laboratory strain B. tabaci QHC-VLC, belonging to the 
Q1 subclade, harbors Cardinium cBtQ1 , which belongs to the 
C1 strain according to its 16S rRNA gene. This strain coexists 
within bacteriocytes harboring Portiera and Hamiltonella and 
can also be found scattered in different tissues of the whitefly 
(Gottlieb et al. 2008), similar to other hosts infected by 
C hertigii (Bigliardi et al. 2006; Kitajima et al. 2007; 
Nakamura et al. 2009). In contrast, C. hertigii strains from 
different Encarsia species have only been detected in the ova- 
ries and accessory cells (Zchori-Fein et al. 2004; Penz et al. 
2012). 

In this work, the genome of Cardinium cBtQ1 endosymbi- 
ont of B. tabaci QHC-VLC was sequenced and compared 
with that of the cEperl strain and with several other 
Bacteroidetes, including the parasitic amoeba endosymbiont 
"Candidates Amoebophilus asiaticus" (hereafter referred to 
as A. asiaticus) (Horn et al. 2001; Schmitz-Esser et al. 2010). 
Overall, all the analysis indicated that the Cardinium cBtQ1 
genome has undergone changes to facilitate its recent estab- 
lishment in B. tabaci. 

Materials and Methods 

Genome Assembly and Annotation 

Enriched bacterial samples (Harrison et al. 1989) were col- 
lected from approximately 40,000 B. tabaci strain QHC-VLC 
adult whiteflies. DNA was extracted with the JetFlex Genomic 
DNA purification kit following the manufacturer's instructions 
(Genomed). DNA was pyrosequenced using Roche 454 GS 
FLX Titanium single-end (shotgun) and paired-end (3 kb) librar- 
ies, and an lllumina HiSeq2000 mate-pair library (5kb). The 
genome assembly was complex due to the high number of 
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repetitive sequences, around 14% of the genome. This 
includes the presence of long duplications, almost 1 00% iden- 
tical, and a large number of IS (Insertion Sequence elements). 
Duplicated regions were detected by abnormal coverage 
peaks, the differential genomic context (different presence 
of IS elements, genes, nonrepetitive intergenic regions), and 
paired-ends/mate-paired connections. The ends of most con- 
tigs were incomplete IS elements of different types. Because 
the contigs ending in the same IS type usually did not overlap, 
we expected a slight underestimation of the percentage of 
repetitive elements of Cardinium cBtQ1 genome. See 
supplementary materials and methods, Supplementary 
Material online (supplementary file S1, Supplementary 
Material online) for a detailed description of the assembling 
and annotation procedure, as well as the software used. 

TBLASTX was used to compare the gene content between 
plasmids of Cardinium cBtQ1 and cEperl , and BLAST hits and 
gene order were plotted with the genoPlotR package (Guy 
et al. 2010) from R software (R Development Core Team 
2011). 

Phylogenetic and Phylogenomic Analyses 

High-quality sequenced 16S rRNAs (including fragments of 
more than 900 bp) were downloaded from SILVA (Quast 
et al. 2013) and National Center for Biotechnology 
Information (NCBI) databases, ssu-aligner was employed for 
the alignment, because it takes into account the secondary 
structure (based on covariance models) of the 1 6S rRNA genes 
(Nawrocki 2009). Predefined masking was selected to ensure 
reproducibility in future alignments, and finally, the alignment 
was refined with Gblocks (Castresana 2000) allowing 50% in 
coverage gaps. General time reversible, with estimates of in- 
variant sites and gamma-distribution among-site rate variation 
(GTR+I+G), was selected as the best model using jModeltest2 
(Darriba et al. 2012). Additionally, amino acid sequences for 
the gyrB gene were downloaded from the NCBI database, 
aligned with MAFFT using the L-INS-i algorithm (Katoh et al. 
2002) and refined with Gblocks. ProtTest3 (Darriba et al. 
2011) gave the improved general amino-acid replacement 
matrix, with gamma distributed rates across sites (LG+G) as 
the best evolutionary model. All accession numbers are sup- 
plied in the supplementary table S1, Supplementary Material 
online. 

For phylogenomic reconstructions, 37 orthologous single 
copy genes related to the translation/transduction machinery 
and protein folding functions that were present in 61 
Bacteroidetes genomes (with the exception of Candidatus 
Sulcia mulleri CARI that lacked 9 of these genes) and a non- 
Bacteroidetes species, used as outgroup, were selected using 
the homology search tool from the Microbial Genome 
Database (Uchiyama 2003) (supplementary table S1, 
Supplementary Material online). Encoding proteins were 
downloaded and aligned with MAFFT v6.717b (L-INS-i) and 



concatenated. The selected best model was LG+G+F based 
on ProtTest3 results. 

The first 15 BLASTP hits using the encoded proteins of 
Cardinium cBtQ1 cgl gene {CHV_c0068) and the Leishmania 
major genes cgl (cystathionine gamma-lyase, LmjF35.3230), 
cbl (cystathionine beta-lyase, LmjF32.2640), and cgs 
(cystathionine gamma-synthase, LmjF1 4.0460) were selected, 
and their amino acid sequences downloaded. In addition, the 
first 1 5 BLASTP hits against Bacteroidetes for the cystathionine 
beta-lyase protein {metQ from Escherichia coli str. K-12 
(NP_417481 .1) were also selected. Amino acid sequences 
were aligned with MAFFT v6.717b (L-INS-i), and ProtTest3 
was used to select the appropriate model for the alignment, 
in this case, the LG+G model. For plasmid TraG protein, the 
first 50 BLASTP best hits were downloaded (including 
Cardinium cEperl and A. asiaticus) and aligned with MAFFT 
v6.717b (L-INS-i). For TraG, the best model selected with 
ProtTest3 was LG+G+I. 

RaxML (Stamatakis 2006) was used to calculate maximum 
likelihood phylogenetic trees for all the alignments, using op- 
timizations for branch lengths and model parameters, and 
1,000 rapid bootstrap replicates. Models were adjusted for 
each case. In addition, PhyloBayes3 (Lartillot et al. 2009) was 
used to infer Bayesian posterior distributions for each phylo- 
genetic tree. In each case, evolutionary model was adjusted to 
the model selected (described above), and three independent 
chains were run for each alignment. Following Lartillot et al 
(2009) recommendations, all effective sizes were greater than 
200 and maximum discrepancy between chains was less than 
0.1 . Finally, a majority rule consensus tree was calculated for 
each alignment. 

Genomic Redundancy and Mobile Elements 

NUCmer from MUMmer 3 (Kurtz et al. 2004) was used to plot 
repetitive regions of A. asiaticus (AmAs), Cardinium cBtQ1, 
and Cardinium cEperl , using each genome as query and sub- 
ject. Results were filtered and only sequences with at least 
95% identity and 500 bp length were used. To estimate the 
level of redundancy in the genomes, a BLASTN approach 
using each genome (chromosome plus plasmid concatenated) 
as query and subject was performed with e-value cutoff of 
1e _2 ° The BLASTN results were transferred to a spreadsheet, 
where any alignment with an identity smaller than 95% was 
removed. Lines were sorted through several steps to identify 
the single copy segments of the genome. The repetitive frac- 
tion was estimated subtracting the summation of the single 
copy segments from the total genome length. IS elements 
were detected as described previously (Gil et al. 2008), refined 
using the web server ISsaga (Varani et al. 201 1) and deposited 
in ISfinder database. Reference copies for each mobile ele- 
ment were used to search with BLASTX against the nonredun- 
dant NCBI database (1e~ 3 e-value cutoff). The 25 best hits for 
each mobile element were used as the input for MEGAN4 
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(Huson et al. 201 1), and taxonomical assignments were done 
with the default LCA parameters. The genomes of A asiaticus, 
Cardinium cEperl , and Cardinium cBtQ1 (with contigs conca- 
tenated in decreasing order, excluding the plasmid) were used 
to compute nucleotide synteny blocks with progressive Mauve 
aligner (Darling et al. 2010). Cardinium cBtQ1 was set as the 
reference for gene order and the alignment was plotted with 
the genoPlotR package. 

Orthologous Gene Identification 

After phylogenomic reconstruction, all genomes belonging to 
the order Cytophagales, including both Cardinium strains 
(cBtQ1 and cEperl) and A. asiaticus (AmAs), were used for 
orthologous gene identification. Flavobacterium johnsoniae 
(Bacteroidetes: Flavobacteriales), whose gliding motility has 
been broadly studied, was selected as the outgroup for the 
order Cytophagales. All proteins, including those from plas- 
mids, were used as the input for OrthoMCL (Li et al. 2003). 
OrthoMCL and COG (Clusters of Orthologous Groups of pro- 
teins) (Tatusov et al. 2000) profile assignment pipelines were 
run as described previously (1.5 as inflation value, 70% of 
match cutoff, and an e-value cutoff of 1e~ 5 ) (Manzano- 
Mann et al. 2012). Gene clusters may contain zero, one, 
two, or more genes in each genome. Some gene clusters 
were manually refined because OrthoMCL failed to recognize 
some orthologous genes in endosymbionts. 

Orthologous genes for A asiaticus and both Cardinium 
strains were classified as core genome (genes shared by 
the three genomes), genes shared by two of the three organ- 
isms and strain-specific genes (supplementary table S2, 
Supplementary Material online). Euler diagrams were ob- 
tained with gplots package (Warnes et al. 2013) from R. 

Last Common Ancestor Reconstruction 

All genomes used in the OrthoMCL clustering method were 
used to reconstruct the putative last common ancestor (LCA) 
gene contents for each node in the phylogeny (fig. 1 and 
supplementary table S3, Supplementary Material online). 
The MPR (most parsimonious reconstruction) function in ape 
package (Paradis et al. 2004) from R was used to infer the 
ancestral state for each character (gene clusters) in each node 
(LCA). Pseudogenes for all genomes were manually revised. 
For orthologous cluster assignments of the pseudogenes, 
TBLASTX was used (e value of 1e~ 5 and an overlap of 80% 
query subject) against the proteins present in the orthologous 
clusters. Pseudogenes that did not modify the LCA reconstruc- 
tion (strain-specific genes) were not considered. Pseudogenes 
that were mobile elements were also excluded. Parsimony 
reconstruction for orthologous groups that included the pre- 
viously selected pseudogenes were checked using parsimony 
reconstruction of discrete characters in Mesquite (Maddison 
WPand Maddison DR 2011). 



For each reconstructed LCA and genome, COG categories 
were assigned for each orthologous cluster based on the initial 
OrthoMCL results and the COG assignment described above 
(Manzano-Mann et al. 2012). For each orthologous cluster, 
COG categories with less than a 10% of a cluster, as well as 
the unassigned category, were removed. The LCA, ^determi- 
nations (the presence/absence of the gene in the LCA node 
could not be determined) were counted as half (0.5), instead 
of presence (1) or absence (0). Relative percentages of each 
COG type using LCA4 or LCA2 as reference were plotted 
using the gplots heatmap2 function without dendograms 
and reordering functions. Euler diagram was plotted using 
gplots. COG profiles, stated as the absolute number of COG 
categories divided by the total number of COG for each 
genome or LCA, were plotted as a heatmap with gplots al- 
lowing hierarchical clustering (dendograms are grouping the 
most similar rows or columns together). Cyclobacteraceae 
family habitats were obtained from GOLD database (supple- 
mentary table S3, Supplementary Material online). 

Analyses of B. tabaci and Encarsia spp. Samples 

See supplementary materials and methods and tables S4-S6, 
Supplementary Material online (supplementary file S1, 
Supplementary Material online). 

Fluorescent In Situ Hybridization 

See supplementary materials and methods, Supplementary 
Material online (supplementary file S1, Supplementary 
Material online). 

Results 

General Features of Cardinium cBtQ1 Genome 

Cardinium cBtQ1 has a relatively small genome (1 .065 Mb) 
composed of a chromosome (1.013 Mb) and a circular plas- 
mid (52 kb) named pcBtQI (table 1). The chromosomal se- 
quence is composed of 11 contigs (612, 80.4, 77.6, 73.8, 
66.9, 41.4, 30.3, 13.5, 7.4, 5.1, and 4.1 kb, respectively) 
with average coverages of 90 x and 547 x for 454 and 
lllumina platforms, respectively, whereas the plasmid is in a 
single contig with coverages of 595x (454) and 4,046x 
(lllumina). Paired-end and mate-pair information confirmed 
that the plasmid was a single closed circular contig. Also, 
the higher coverage of the plasmid compared with the chro- 
mosomal contigs is indicative of a multicopy plasmid, probably 
between six and seven copies relative to the chromosome. 

We found 709 and 30 coding genes on the chromosome 
and plasmid, respectively. Many of them were annotated as 
encoding conserved or hypothetical proteins. We also anno- 
tated 1 56 pseudogenes in the chromosome, 1 32 derived from 
transposases and 24 from nontransposase genes (supplemen- 
tary table S7, Supplementary Material online), and 4 in the 
plasmid (3 transposases and 1 resolvase). The genome 
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Salinibacter ruber DSM 13855 
Salinibacter ruber M8 



87 



Rhodothermus marinus SGO. 5JP17-172 
Rhodothermus marinus DSM 4252 
j — — Haliscomenobacter hydrossis | Saprospiraceae 



Rhodothermaceae 



Order color code 

— Flavobacteriales 

— Bacteroidales 

— Cytophagales 

— Sphingobacteriales 



94 



Flavobacteriaceae 



— Chitinophaga pinensis 
_l Capnocytophaga ochracea 

— Capnocytophaga canimorsus 
i — Zunongwangia profunda 

-P — Gramella forsetii 

— Croceiba cter atlanticus 
Krokinobactersp. 4H-3-7-5 
Lacinuthx sp. 5H-3-7-4 
Muricauda ruestringensis 

— Robiginitalea bi formats 
Zobellia galactanivorans 

- Mahbacter sp. HTCC2170 
n — Cellulophaga algicola 
* — Cellulophaga lytica 

— Flavobactehum columnare 

Flavobactehum johnsoniae 
-P — Flavobactehum branchiophilum 
I — Flavobactehum psychrophilum 

Blattabactehum sp. Periplaneta americana 
Blattabactehum sp. Mastotermes darwiniensis 
Blattabactehum sp. Blattella germanica 
Candidates Sulcia muelleri CARI 



Blattabacteriaceae 



Flavobacteriaceae 



85 



j — Riemerella anatipestiferATCC 11845 
"I Flavobactehaceae bactehum 

- Weeksella virosa 

- Fluviicola taffensis | Cryomorphaceae 

- Odohbactersplanchnicus I POTphyromonadacea e 

- Paludibacter propiomcigenes I 
Candidates Azobacteroides pseudotrichonymphae 



Parabacteroides distasonis I Porphyromonadaceae 

q Prevotella ruminicola 
- Prevotella denticola 
- Prevotella melaninogenica 
rBacteroides vulgatus 



Prevotellaceae 



Bacteroides salanitronis 
r Bacteroides thetaiotaomicron 
-L Bacteroides fragilis NCTC9343 



Bacteroidaceae 



54 



Bacteroides fragilis YCH46 
Bacteroides helcogenes 
Porphyromonas gingivalis ATCC 33277 
Porphyromonas gingivalis W83 
Porphyromonas gingivalis TDC60 

Porphyromonas asaccharolydca 



Porphyromonadaceae 



67 



Sphingobacteriaceae 



I— 7 

791 5 



Pedobacter hepahnus 
Sphingobactehumsp. 21 
Pedobacter saltans 

Candidates Amoebophilus asiaticus 

- Candidates Cardinium hertigii cEperl 

1 - Candidatus Cardinium hertigii cBtQl 

— Cyclobactehum mahnum \ Cyclobacteriaceae 
■ Mahvirga tracteosa \ Flammeovirgaceae 

— Leadbetterella byssophila 

— Spirosoma linguale 

— Runella slithyformis 

— Dyadobacter fermentans 
Cytophaga hutchinsonii 



Amoebophilaceae 



0.1 



Chlorobaculumtepidum 



Cytophagaceae 



Fig. 1. — Phylogenomic maximum likelihood tree of 61 Bacteroidetes. Phylogenomic reconstruction was done under the LG+G+F model on a con- 
catenated alignment of 37 proteins. Cardinium genomes fall in the Cytophagales clade, with Mahvirga tractuosa and Cy. mahnum as the closest free-living 
relatives. Cardinium cBtQl is displayed in bold. Family names are displayed on the right delimited by a horizontal red line. The genomes used for the LCA 
reconstruction are shown in blue. Numbers inside gray dots show the LCAs reconstructed in each node. Only maximum likelihood bootstrap values below 
95% are displayed. Bayesian posterior probabilities for each node were above 0.95 and are also not displayed. Chlorobaculum tepidum was used as 
outgroup. 
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Table 1 



General Genomic Features of Cardinium Strains and Amoebophilus asiaticus 


Bacterial Genome 


Cardinium cBtQ1 a 




Cardinium cEper1 b 


A. asiaticus 5a2 


Host 


Bemisia tabaci 




Encarsia pergandiella 


Acanthamoeba spp. 




Chromosome Plasmid 


Chromosome Plasmid 


Chromosome 


Contigs 


11 


1 


1 1 


1 


Size (kb) 


1,013 


52 


887 58 


1,884 


GC (%) 


35 


32 


36 31 


35 


CDS 


709 


30 


841 65 


1,557 


Average CDS length (bp) 


1,033 


1,389 


911 733 


990 


Coding density (%) 


79.7 


80.1 


85.5 82.1 


81.8 


rRNAs 


3 




3 — 


3 


tRNAs 


35 




37 — 


35 


Other RNA genes 


2 








Pseudogenes (total) 


156 


4 


3 — 


222 


Pseudogenes (transposase) 


132 


3 


3 — 




Pseudogene (other CDS) 


24 


1 






Reference 


This study 




Penz et al. (2012) 


Schmitz-Esser et al. (2010) 



a The chromosome of Cardinium cBtQ1 is not closed, but it is considered a high-quality draft genome. 
^The chromosome of Cardinium cEperl contains a single gap not closed due to repetitive elements. 



contains one set of rRNA genes distributed in two segments, 
one including the 1 6S rRNA and the other the 23S and the 5S 
rRNA genes. It also contains a set of 35 tRNA genes, which are 
able to completely decode the mRNA sequences, and two 
other noncoding RNA genes (rnpB and tmRNA) (table 1). 

Taxonomic Status of Cardinium cBtQ1 Endosymbiont of 
B. tabaci Biotype Q 

We performed a phylogenomic reconstruction of a 
Bacteroidetes phylogeny (see supplementary table S1, 
Supplementary Material online, for locus tags in each 
genome), which showed that both Cardinium and A. asiaticus 
formed a well-differentiated clade, related to the families 
Cyclobacteriaceae and Flammeovirgaceae, with the family 
Cytophagaceae slightly more distant (fig. 1). This reconstruc- 
tion was used to select the genomes for subsequent analyses. 

Because this phylogenomic reconstruction is consistent 
with other reported studies (Gupta and Lorenzini 2007; 
Karlsson et al. 2011), and due to the high bootstrap values 
obtained, we propose that the Cardinium/Amoebophilus 
clade should be assigned to the order Cytophagales, instead 
of remaining in the nonclassified Bacteroidetes. Furthermore, 
the analysis suggests that within the Cytophagales, 
Cardinium forms a new family phylogenetically allied with 
the Cyclobacteriaceae and Flammeovirgaceae, to which we 
propose the name Amoebophilaceae (fig. 1). 

To establish the relationship of Cardinium cBtQ1 to other 
Cardinium endosymbionts based on 16S rRNA sequences, a 
covariance model aligner was employed. The 1 6S rRNA align- 
ment was used to infer a phylogeny, showing that almost all 
Cardinium endosymbionts of B. tabaci (including cBtQ1) were 
present in a clade with other Cardinium endosymbionts of 



several Encarsia species (99.14% identity with the whole 
16SrRNA of Cardinium from E. pergandiella) (fig. 2, left). 
A phylogeny with gyrB was also performed, which corrobo- 
rated the close phylogenetic relation with Cardinium from 
Encarsia spp. and also showed that Cardinium cBtQ1 was em- 
bedded in the Cardinium-Encarsia clade (fig. 2, right). Because 
1 6S rRNA sequences of Cardinium cBtQ1 and the species type 
C. hertigii (symbiont of E. hispida) show only 1 .2% of differ- 
ences, we propose that Cardinium cBtQ1 is a strain of the 
latter, in agreement with previous reports (Zchori-Fein and 
Perlman 2004). The Bemisia/Encarsia clade belongs to the 
Cardinium group A, which is well differentiated from the 
other two groups included in the analysis: Group C, which is 
specific to the genus Culicoides (Nakamura et al. 2009) and 
group D, which is present in some Copepoda spp. (Edlund 
etal. 2012) (fig. 2). 

Genome Comparison of Cardinium Strains and 
A. asiaticus 

Compared with A. asiaticus (Schmitz-Esser et al. 2010), the 
genomes of Cardinium cEperl (Penz et al. 2012) and 
Cardinium cBtQ1 (this work) contain a smaller number of 
coding genes (table 1). However, Cardinium cEperl has 
almost no annotated pseudogenes, a difference that might 
reflect gene annotation criteria because open reading frames 
that belong to transposase fragments are annotated as coding 
sequences (CDSs). The average gene identity between 
Cardinium cEperl and cBtQ1 was 92.9%, whereas the aver- 
age protein identity was 91.8%. The genome fraction as- 
signed to coding genes (labeled as coding density in table 1) 
was approximately 6% smaller in Cardinium cBtQ1 than in 
Cardinium cEperl . 



1 01 8 Genome Biol. Evol. 6(4): 101 3-1 030. doi:10.1093/gbe/evu077 Advance Access publication April 10, 2014 



Cardinium cBtQ1 , an Endosymbiont of Bemisia tabaci 



GBE 



r 



£ 03 

< _c 

a. ^ 

o I 

CJ) t 

5 
.3 



o 

<Z 

o 



^ I 

oo c. 

Q. .03 

3 U 



E ~2 
O £ 



S -2 

cu Q- 

+- ' 13 

a; id " 

QJ CD 



< 



15 || 

J: o as 
+ 



03 c cc: > 
_ cu i — > 
O sq 



cd O 



5 .| 
■2 ■§ 

~5 1/1 



- % - 

— O U 
fC ~Q C 
'oo CD <D 
< O =5 

| 6 S" 

_C <^ oo 



CD O 

-Q _a 

oo 03 



T3 ^ Ol (5 rc 




CD 



- 5 

o ]d 

E Q_ 

n3 g- 

on .CD 

0- ^ 



■8 ' 

o ■ 



ro C CD 
-C= . _Q 
CD C= 



Q_ 

C O "D 

f 

-p, 03 Qj 



CZ 03 _ CD 

■S 3 X I £ 

+-> T3 CD ■ — 

CD CD CD Q_ . r 



CD ^ 

o -=2 



a 



03 CD 



2^ O 
03 LO 



E Q. 



-Q 



03 QQ CD 



CD CD 



03 _Q 
^ E 



S £ -Q- CD 



>, Q- O 



? a 
o 



< = 

>< E 



I +-• CD U CD 



_Q 

no 2 



a. 

Q_ 



E 
o 



O CD 

o & 

CD TO 



O 00 00 CD 
^- CD ;- i= 



■-P O 

S E 

CD CD 
CD _C 
O 

a. as 
.2 ^> 

cu o 
>> t 

03 n3 



U 

§- ; 
o 

5 ■ 

00 ■ 
O 



I TO .24 

fN t ■ = 



CD 

£ + 

O — 1 

CD 



Q_ 

CD CD 03 
03 to 

o b 8 

Q_ "o 00 



.^0.^3 

M— OO C 



C -r-, M= "D 

§-< 1 £ 

CD t CD C 



Genome Biol. Evol. 6(4): 101 3-1 030. doi:10.1093/gbe/evu077 Advance Access publication April 10, 2014 



1019 



Santos-Garcia et al. 



GBE 




Fig. 3. — Comparison of Cardinium plasmids. TBLASTX comparison of the plasmids from Cardinium cBtQ1 (pcBtQI ) and cEperl (pCher). Gray arrows are 
genes included in the syntenic block, blue arrows nontransposase genes, red arrows transposase genes, and the green arrow is a resolvase pseudogene. Red 
lines show genes in the same orientation and blue lines in reverse orientation. Some gene names are shown in the plot. 



Cardinium cEperl also contains a plasmid (pCher) of similar 
size to pcBtQI . However, the gene content of both plasmids is 
different, with only a few shared genes in a syntenic segment 
showing a high level of nucleotide identity (fig. 3). For exam- 
ple, CHV jd004 {virE, virulence-associated E family protein), 
CHV _p006 (pre, plasmid recombination enzyme), and 
CHV _p011 (traG, putative conjugal transfer protein TraG) all 
show nucleotide identities that range from 76% to 83% be- 
tween both plasmids. Phylogenetic reconstructions for each of 
these genes support the close relation between both plasmids 
(see TraG phylogeny in supplementary file S1 : supplementary 
fig. S1, Supplementary Material online). Nucleotide, gene 
order conservation, and phylogeny suggest that both plasmids 
derive from a common ancestral plasmid present before the 
split of both Cardinium lineages. A putative traG gene 
(Aasi_0886), closely related to Cardinium traG phylogeneti- 
cally (supplementary fig. S1, Supplementary Material online), 
is harbored in the A. asiaticus chromosome, suggesting that 
the origin of the plasmid can be traced to the family 
Amoebophilaceae, with its subsequent chromosomal inser- 
tion in A. asiaticus. 

All three genomes share a core of 468 gene clusters (sup- 
plementary table S2, Supplementary Material online, and 
fig. 4), of which six encode putative host interacting proteins 
(Penz et al. 2012). There are 140 unique gene clusters that 
were present in both Cardinium but not in A. asiaticus, 46 of 
which encode hypothetical proteins, 1 5 are membrane trans- 
port related proteins, and 13 are putative host interacting 
proteins. Among the remaining Cardinium shared genes, 
there are six coding for transposases, two for phage-derived 
proteins (or afp-like proteins), and five for vitamin biosynthetic 
proteins (supplementary table S2, Supplementary Material 
online). 

Cardinium cEperl has 202 strain-specific gene clusters, 
which include, among others, those encoding hypothetical 
proteins (145), transposases (30), host-interacting proteins 
(6), and biotin (2) and pyridoxal (1) biosynthetic enzymes. 
Cardinium cEperl and A. asiaticus share 13 gene clusters 



cEpe rl 




Fig. 4. — Euler diagram of orthologous clusters. Euler diagram repre- 
senting the core genome, the strain-specific orthologous clusters, and the 
orthologous clusters shared by only two organisms. Numbers inside each 
subspace represent the number of orthologous gene clusters assigned to 
the subspace. Core genome set is displayed in orange. cBtQ1, Cardinium 
cBtQ1; cEperl, Cardinium cEperl; AmAs, Amoebophilus asiaticus. 



with most of them defined as hypothetical proteins (6), 
mobile elements (3), a cell-wall-related protein, a membrane 
protein, and a host-manipulation protein (supplementary 
table S2, Supplementary Material online). 

Cardinium cBtQ1 contains 71 gene clusters, 65 strain- 
specific, and 6 shared with A. asiaticus, which are not present 
in Cardinium cEperl. These gene clusters include ankyrin- 
domain-containing proteins (14), hypothetical proteins (35), 
transposases, and other mobile elements (4). A set of very 
interesting genes is located in the multicopy plasmid of 
Cardinium cBtQ1. They include four gliding genes (gldK, 
gidL, gidM, and gldN, see fig. 3) that are related to motility 
in members of the phylum Bacteroidetes (shared with A. asia- 
ticus). The fact that the chromosome of Cardinium cBtQ1 
contains four duplicated genes rtxBDE {A. asiaticus contained 
a single copy of the paralogous genes rtxBE) and tolC related 
to type I secretion system (T1SS) is also remarkable because 
only a few sequenced Bacteroidetes harbor secretion system 
types I, III, IV, or VI (McBride and Zhu 201 3). The rtxBDE genes, 
related to the hemolysin secretion proteins (Hly), seem to be 
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Fig. 5. — Redundancy and synteny between Cardinium and Amoebophilus asiaticus genomes. (A) Putative linear representation of the ancestral genomic 
region before duplication (on top) and the present state of the two duplications, which are distributed in five contigs (on bottom). Note that Ipxti, mreB, tolQ 
rtxBDE, and yitW conserve two intact copies, whereas the other duplicated genes only maintain one intact copy. Red arrows are mobile elements, blue 
arrows genes in the duplicated region, green arrows pseudogenized genes, and gray arrows adjacent genes outside the duplication. Orange bars connect the 
two duplicated copies of each gene. Contig names are plotted at the beginning or the end of the contig (CHVJ, and only regions that contain the 
duplications are shown. The right ends of contigs CHV_g and CHV_e are connected through paired-end information with the right ends of either contig 
CHVJ or CHVJ. In both cases, a complete ISCcal copy, whose fragments are detected at the end of the contigs, is required for joining. (B) Mummer plot 
showing direct (red) and inverted (blue) genomic repeats with at least 500 bp lengths and 95% similarity. For A asiaticus (AmAs) and Cardinium cEperl 
(cEperl), inner plot lines denote the division of the chromosome in base pairs sections. Black arrows point contig ends for the largest contigs in Cardinium 
cBtQ1. These contigs were placed in order of decreasing length. Because plots are not scaled to genome size due to limitations of the software, it is 
noteworthy that the A asiaticus genome is less repetitive than Cardinium cBtQ1 although the more compact plot in the former may alter that impression. (0 
Common pairwise syntenic blocks of more than 1 kb for A asiaticus (AmAs), Cardinium cBtQ1 (cBtQ1 ), and cEperl (cEperl ). The chromosome of cBtQ1 was 
taken as reference. Contigs in cBtQ1 are ordered in order of decreasing length and denoted by double backslashes. For plotting reasons, only the seven 
longest cBtQ1 contigs are shown. Red and blue lines show blocks in direct and inverted orientation. The stronger the line, the more nucleotide identity 
between synteny blocks. 



an event of horizontal gene transfer (HGT), with RTX toxin 
transport system of Vibrio as best BLAST hits. The chromo- 
somal segment involving these genes is duplicated in 
Cardinium cBtQ1 (fig. 5A). 



Other interesting Cardinium cBtQ1 -specific genes, also lo- 
cated in the plasmid, are the putative toxin-related genes 
CHV jd018 and CHV _p021. The latter encodes a long (4,603 
amino acids) RHS-repeat-associated core-domain protein with 
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C-terminus ankyrin repeats. Large proteins with RHS domains 
have been related with bacterial insecticidal toxins and inter- 
cellular signaling proteins (TIGR03696). Although no clear 
signal peptide was detected in CHV _p021, the presence of 
ankyrins in the C-terminus domain in combination with a 
signal peptide has been attributed to protein secretion by 
T1SS (Kaur et al. 2012). Because the best BLASTX hits, with 
63% query coverage and 36% identity (on average), belong 
to Daphnia, Wolbachia (HGT event), and mosquitoes, it sug- 
gests that the target of this protein can be a conserved protein 
in arthropods. 

The level of redundancy in the genome of Cardinium cBtQ1 
(-14%) was twice as high as the level found in Cardinium 
cEperl and A. asiaticus (-7% in both cases), most of which 
is associated with mobile elements (fig. 5B). The mobile ele- 
ments of Cardinium cBtQ1, and their inactive derivatives, ac- 
count for approximately 1 66 kb of the chromosome (196 
copies) and 1 2.5 kb of the plasmid (1 2 copies) (supplementary 
table S8, Supplementary Material online). From this number of 
mobile element copies, only 48 contained a functional trans- 
posase gene, whereas 135 were transposase pseudogenes. 
These transposase proteins were classified in 20 different IS 
families, with only eight being complete IS elements (contain- 
ing intact transposase genes and inverted repeats at their 
ends) and could be named according to the ISfinder recom- 
mendations and deposited under the names ISCcal -8 (sup- 
plementary table S8, Supplementary Material online). Only 
three mobile element types were specific of the Cardinium 
cBtQ1 (ISCca6, nv_IS3, and the Retron type one), whereas 
the rest of transposases were shared with A. asiaticus, 
Cardinium cEperl or both. 

It would appear that at least some IS in Cardinium are still 
active, in contrast with A. asiaticus (Schmitz-Esser et al. 201 1 ), 
given we can observe very recent gene duplication events 
(based on > 99.9% nucleotide identity), with one of the 
copies interrupted by the insertion of an IS (e.g., pseudogenes 
recG or ftsK) (fig. 5A). Another important feature is the pres- 
ence of a repetitive element composed of a copy of ISCca4 and 
a copy of nv_IS2, resulting in an IS that is apparently active. The 
inactivation of ftsK was produced by the insertion of this chi- 
meric IS. Recombination associated to IS may be the cause of 
the high number of rearrangements in the Cardinium cBtQ1, 
with only some microsyntenic blocks maintained (fig. 5Q. 

Finally, Cardinium cBtQ1 contains a recent chromosome 
segmental duplication (almost 100% identical) involving at 
least 11 genes and around 1 7 kb, distributed in five contigs 
(fig. 5A). The duplicated region contains the genes cpsA 
(a protease), recG (recombination and DNA repair), glyS 
(aminoacyl-tRNA-ligase), Ipxti (lipid A biosynthesis), mreB 
(actin-like bacterial cytoskeleton), to/C(T1SS transmembrane 
transporter), rtxB, rtxD, and rtxE (T1SS ABC transporters HlyB 
and HlyD), fe/C(cell cycle and chromosome partitioning), and 
yitW (putative chromosome partitioning related function). 
Although one of the two copies of cpsA, recG, glyS, and 



ftsK is pseudogenized by IS insertions, the rest of the genes 
conserve the two functional copies (fig. 5A), indicating that 
their retention could be advantageous for the organism fit- 
ness. However, the fact that the two copies are still active due 
to their recent duplication cannot be ruled out. 

Evolution of Gene Repertoires in Lineages of 
Amoebophilus and Cardinium 

Gene clusters obtained with OrthoMCL were classified 
according to COG categories and used to reconstruct by max- 
imum parsimony the gene cluster content in the nine LCA 
corresponding to each node denoted with gray circles 
(fig. 1). Several analyses were performed to compare the 
gene content of the present and reconstructed ancestral 
genomes. 

First, hierarchical clustering based on the relative abun- 
dance (percentage) of each COG category in each genome 
was performed (fig. 6). Three main clusters were observed: 
One that contained the endosymbiotic genomes and the 
LCA1 and 2; a second that grouped Marivirgia tractuosa 
and Cyclobacterium marinum with the LCA3, 4, 8, and 9; 
and a third that contained the rest of the genomes and 
LCAs. The second cluster (fig. 6, blue) showed a clear reduc- 
tion in some COG groups as G and K but an enrichment in the 
H and J groups when it was compared with the third cluster 
(fig. 6, yellow). Also, hierarchical clustering grouped both 
Cardinium strains with A. asiaticus and LCA1 and LCA2. 
They showed a stronger conservation of genes in some 
COG categories such as J and O, when they were com- 
pared with the free-living Bacteroidetes, a signal also ob- 
served in other symbiotic reduced genomes. LCA4, the 
ancestor of the Cardinium/A. asiaticus lineage and 
family Cyclobacteriaceae, was close to the free-living 
Cyclobacteriaceae (see habitats in supplementary table S3, 
Supplementary Material online), suggesting a similar free- 
living style. Parsimony reconstruction assigned 1,301 gene 
clusters to LCA4 and the equally parsimonious presence/ 
absence of other 684 gene clusters. 

Second, the transition from LCA4 to LCA2 was examined 
comparing the percentages of gene clusters in each COG cat- 
egory regarding LCA4 (1 00%) (fig. 7 A). It had a strong impact 
in the number of gene clusters with more than half of them 
being lost (LCA2, 655 gene clusters plus 36 present/absent) 
(supplementary table S3, Supplementary Material online). 
Relatively to LCA4 (100%), there was a high reduction in all 
categories except for some housekeeping categories such as J, 
L, and D with more than 80% of gene clusters conserved. 
Biosynthetic capabilities of LCA2 were clearly reduced (e.g., 
COG categories C, E, F, and H). 

Third, the transition from LCA2 to LCA1 and to both 
Cardinium and A. asiaticus was examined comparing the per- 
centages of gene clusters in each COG category regarding 
LCA2 (100%) (fig. IB). The transition to LCA1 (649 gene 
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Fig. 6. — Relative abundance of gene clusters in Bacteroidetes and LCAs. Hierarchical clustering heatmap representing the relative abundance (per- 
centage) of each COG category in relation to the total number of gene clusters in each genome. Three main COG clusters (left) are observed: Highly retained 
categories (J, L, and R), medium retained categories (I, H, G, T, 0, E, P, C, S, K, and M), and low retained categories (V, F, Q, U, D, N, and Z). Three main 
species/LCA cluster (up) are cEperl, AmAs, cBtQ1, LCA1, and LCA2 (only symbionts, left cluster); MaTr, CyHu, LCA3, LCA4, LCA8, and LCA9 (middle 
cluster), and FUh, SpLi, DyFe, LeBy, CyMa, RuSI, LCA5, LCA6, and LCA7. Species clustering together by COG categories could have similar metabolic features 
and consequently, a similar ecological niche. Cardinium cEperl (cEperl), Ca. Amoebophilus (AmAs), Cardinium cBtQ1 (cBtQ1), Marivirga tractuosa (MaTr), 
Cytophaga hutchinsonii (CyHu), Flavobacterium johnsoniae (FUh), Spirosoma linguale (SpLi), Dyadobacter fermentans (DyFe), Leadbetterella byssophila (LeBy), 
Cydobacterium marinum (CyMa), and Runella slithyformis (RuSI). 



clusters) produced the loss of 1 60 gene clusters, although 1 1 8 
new gene clusters were acquired. Comparing the number of 
gene clusters of LCA2 to LCA1 , and to both Cardinium and A 
asiaticus, we observed several differences among COG cate- 
gories (fig. IB, supplementary table S3, Supplementary 
Material online). First, A asiaticus showed 331 strain-specific 
gene clusters, distributed in several categories, not present in 
LCA2. Second, the reductive evolution of the Cardinium line- 
age was more clearly observed in several COG categories, 
such as E, G, H, S, T, and V. The absence of gene clusters in 



Cardinium for the N category was probably due to the fact 
that some genes related with motility have not been yet an- 
notated in the COG database, especially those involved in 
gliding motility (see later) that are, in fact, present in 
Cardinium cBtQ1, A asiaticus, LCA1, LCA2, and LCA4. 

Biosynthetic Capabilities in Cardinium cBtQ1 

Cardinium cBtQ1, according to KEGG classification pathways, 
presents low biosynthetic capabilities, similar to those 
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Fig. 7. — Last free-living common ancestor comparison. (A) Heatmap showing the percentage of genes in each COG category, compared with the 
number of the same category in LC4 (1 00%). In left, reduced phylogenomic reconstruction with the name of each LCA reconstructed. (B) The same heatmap 
type comparing to LCA2 (100%). L category in Cardinium cEperl is an artifact produced by an incorrect annotation of inactivated transposases as CDS 
instead of pseudogenes. COG definitions are below. cBtQ1, Cardinium cBtQ1; cEperl, Cardinium cEperl; AmAs, Amoebophilus asiaticus. 



observed in Cardinium cEperl and A asiaticus (Karlsson et al. 
2011; Penz et al. 2012).The main differences between the 
biosynthetic capabilities of both Cardinium strains are in the 
biosynthesis of vitamins and cofactors. Both bacteria are able 
to produce lipoate, a key cofactor for intermediate metabo- 
lism and an important antioxidant molecule (Spalding and 
Prigge 2010). Cardinium cEperl has the genes pdxS and 
pdxT and can synthesize pyridoxal 5-phosphate (precursor of 
vitamin B6). However, pdxT was pseudogenized by an IS 
transposition in cBtQ1. This event seems to have happened 
recently, because the pdxT pseudogene is 93.5% identical to 
the cEperl gene, a percentage higher than the average gene 
identity between these two strains. 

In addition, it is noteworthy that in Cardinium cEperl, 
the presence of a complete biotin operon, a coenzyme be- 
longing to vitamin B class, is an event of HGT from 
Alphaproteobacteria to the genus Cardinium (Penz et al. 
2012). The loss of the ability to synthesize biotin in 
Cardinium cBtQ1 seems to have taken place by the combined 
effect of the insertion of an IS and a later deletion event, 
removing the complete bioB gene and almost the complete 



sequence of the adjacent bioF gene (92.5% identical to 
cEperl in the remnant segment). Another recent signal 
of the loss of a nutritional contribution is the pyridoxal- 
dependent enzyme cystathionine gamma-lyase (involved in 
the synthesis of cysteine) whose CDS contains an internal 
stop codon mutation that produces the pseudogene 
CHV_c0068 in cBtQ1 (94.9% identical to cEperl gene). 
A phylogenetic analysis showed that the functional gene, pre- 
sent in this state in cEperl, was acquired by an ancestor 
through HGT from a eukaryote, perhaps an amoeba (supple- 
mentary file S1: supplementary fig. S2, Supplementary 
Material online). The phylogenetic analysis, including the 
three in silico identified cystathionine metabolizing enzymes 
of L. major (Williams et al. 2009), showed its closer relation 
with L. major cystathionine gamma-lyase rather than with 
L. major cystathionine beta-lyase, as previously annotated in 
Cardinium cEperl (Penz et al. 2012). 

In Cardinium cBtQ1, the inability to synthesize pyridoxal 
and biotin may be complemented by other facultative endo- 
symbionts of the host. In the case of B. tabaci strain QHC-VLC, 
such enzyme synthesis could be achieved by Hamiltonella, 
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which is located inside the bacteriocytes (supplementary 
fig. S3, Supplementary Material online). The complete sets 
of genes required for the synthesis of both cofactors was de- 
scribed for the Hamiltonella secondary endosymbiont of the 
aphid Acyrthosiphon pisum (Degnan et al. 2009), and all of 
them were also found by BLAST similarity in the draft genome 
of Hamiltonella from B. tabaci strain QHC-VLC (unpublished 
data). 

Gliding Genes in Cardinium 

As stated earlier, the genome of Cardinium cBtQ1 harbors 
four gliding genes organized in an operon (gldK, gldL, 
gldM, and gldN) in the pcBtQI plasmid (fig. 3). To state 
that the differential tissue locations of Cardinium endosym- 
bionts of B. tabaci (scattered) and Encarsia spp. (restricted 
to ovaries) were correlated with the presence of the four 
gliding genes, two populations of E. pergandiella (USA and 
Brazil), one of E. hispida (Italy) and one from E. inaron 
(USA) were screened for their presence. None of these 
species gave a positive result for any of the gliding 
genes, whereas the presence of Cardinium was stated in 
all of them through the amplification of the control gene 
gyrB (supplementary file S1: supplementary materials and 
methods and tables S4 and S6, Supplementary Material 
online). 

ForR tabaci, adult whiteflies were sampled in 12 points 
from four localities of the province of Valencia (Spain) (sup- 
plementary file S1: supplementary materials and methods 
and tables S5 and S6, Supplementary Material online). 
Polymerase chain reaction (PCR) amplification revealed the 
presence of the four gliding genes (as well as of the large 
plasmid gene CHV _p021) in all the samples (except one 
that failed in all the PCR amplifications). To determine 
insect biotypes, a mitochondrial cytochrome oxidase I 
(COI) fragment was sequenced in four females per sam- 
pling point. Nine of the samples belonged to the biotype Q 
(Mediterranean species), whereas two were from the bio- 
type S (Sub-Saharan Africa species), an uncommon biotype 
in Spain (Moya et al. 2001) (supplementary file S1: supple- 
mentary fig. S4, Supplementary Material online). All the 
samples, including those of biotype S, harbored 
Cardinium, which was determined by PCR amplification 
of a fragment of the 16S rRNA gene. PCR products 
from biotype S (sample F) and biotype Q (sample B) 
(1,100 bp) were sequenced and resulted 100% identical 
to Cardinium cBtQ1. 

The presence of additional secondary endosymbionts was 
also checked finding that biotype Q individuals also contained 
Hamiltonella, whereas biotype S contained Arsenophonus and 
Wolbachia (supplementary file S1: supplementary materials 
and methods and tables S5 and S6, Supplementary Material 
online). 



Discussion 

Although the cost of maintaining an obligate mutualistic en- 
dosymbiont may be compensated in many cases by its sup- 
plementation of the host diet, the maintenance of stable 
associations with facultative symbionts may require either re- 
productive manipulation or compensatory benefits from the 
endosymbiont (Oliver et al. 2010; White et al. 2011; 
Wernegreen 2012). Both facultative and obligate endosymbi- 
onts are transmitted vertically, but facultative symbionts may 
retain the ability to be horizontally transmitted as revealed by 
the incongruence of host and symbiont phylogenetic recon- 
structions (Russell et al. 2003). 

Symbionts belonging to the genus Cardinium are present in 
many types of arthropods including arachnids (Nakamura 
et al. 2009), crustaceans (Edlund et al. 2012), and insects 
(Zchori-Fein et al. 2004; Zchori-Fein and Perlman 2004; 
Gruwell et al. 2009; Nakamura et al. 2009), indicating that 
members of this genus can colonize new niches. Phylogenies 
and nucleotide conservation of two genes (16S rRNA and 
gyrB) show that Cardinium from B. tabaci QHC-VLC was clo- 
sely related to Cardinium from armored scale insects and par- 
asitoid wasps, including the species type C. hertigii from the 
wasp E. hispida (Zchori-Fein et al. 2004). Researchers have also 
proposed that Cardinium cBtQ1 was a strain of C. hertigii, 
closely related to Cardinium cEperl (symbiont of E. pergan- 
diella) whose genome has recently been reported (Penz et al. 
2012). In this work, we have sequenced the genome of 
Cardinium cBtQ1, and all the analyses carried out confirm 
that this strain is closely related to Cardinium cEperl, despite 
being endosymbionts of whiteflies and parasitoids, respec- 
tively. There are, however, some differences that could be 
related to the massive presence of IS in Cardinium cBtQ1, as 
well as to the adaptation to the specific hosts of each strain 
(see later). 

The high number of available Bacteroidetes genomes pro- 
vided a robust phylogeny (fig. 1), which showed that both 
Cardinium (cEperl and cBtQ1) and A. asiaticus formed a 
well-defined clade, distant from other family members of 
Cyclobacteriaceae, Cytophagaceae, and Flammeovirgaceae 
in the order Cytophagales. Thus, we propose that they form 
a new family, to be named Amoebophilaceae. This name is 
proposed because A. asiaticus has a larger genome than 
Cardinium spp. and shares more genes with LCA2 than 
either Cardinium strain. Also on the basis of this phylogeny, 
we were able to infer the coding gene contents (clusters of 
genes) of the most recent common ancestor in the Cardinium/ 
Amoebophilus clade (LCA4, LCA2, and LCA1) and to analyze 
the evolution of the gene repertoires (figs. 6 and 7). 

Despite it is not the objective of this work, we can extract 
general ideas from the cluster analysis (fig. 6). The distribution 
of genes in COG categories is a well predictor of the way of 
life of the organisms. Hierarchical clustering indicates that 
LCA3 to 9 were, similar to free-living Bacteroidetes, able to 
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occupy different niches (fig. 6). For example, the differences 
between the abundance of G category in the middle and right 
clusters could be related to a more restricted source of carbo- 
hydrates (niche specialization). It also seems that the increase 
of the coenzyme metabolism (H) in the middle cluster could be 
advantageous for the establishment of symbiotic relationships 
(Cydobacterium was found in the celomic fluid of a sand 
dollar, supplementary table S3, Supplementary Material 
online). 

The number of gene clusters estimated in LCA4 was be- 
tween 1,301 and 1,985, showing a COG profile similar to 
those of M. tractuosa (Flammeovirgaceae) and Cy marinum 
(Cyclobacteriaceae) (fig. 6), and differing from the COG pro- 
files of LCA2 and LCA1, which clustered with those of the 
symbiotic bacteria A asiaticus and Cardinium (fig. 6). Because 
species of the family Cyclobacteriaceae are mostly marine 
free-living bacteria, and associations with animal hosts have 
been described (supplementary table S3, Supplementary 
Material online), we propose that LCA4 was probably a 
marine free-living bacterium, with a wide range of functional 
capabilities and the ability to establish symbiotic relationships. 
Also, it is likely that LCA4 was able to glide because it con- 
tained the whole set of gliding genes essential for gliding, 
including the sprATE genes (McBride and Zhu 2013). 

The transition from LCA4 to LCA2 was clearly a reductive 
process that affected almost all COG categories (fig. 7A) 
producing an ancestral endosymbiont with few biosynthetic 
capabilities. Considering that the species derived from LCA2 
were endosymbionts of amoebas (Horn et al. 2001 ; Schmitz- 
Esser et al. 2010) or insects (Zchori-Fein and Perlman 2004; 
Penz et al. 201 2), the most probable reason for this reduction 
was the transition from a free living to intracellular life style, to 
start a symbiotic (either mutualistic or parasitic) relationship 
with a eukaryotic host. During this transition, the number of 
gene clusters and associated functions was reduced, although 
LCA2 maintained the ability to acquire new genes by HGT. 
The higher number of gene clusters in A asiaticus versus LCA2 
(298) could be due to this fact, although other reasons, such 
as the possibility of a biased sample of genomes, or different 
annotation problems, would also explain its high number of 
specific gene clusters. 

Genome reduction was an ongoing process in the LCA1/ 
Cardinium clade, and it was notorious for some categories 
such as E and G (fig. IB). Moreover, the comparative analysis 
of the gene contents of the two Cardinium strains revealed 
that, despite being very similar at nucleotide level (92.9% nu- 
cleotide identity), revealing a recent evolutionary divergence, 
there are some relevant differences between the genomes 
of both strains, indicating differences in the evolution of en- 
dosymbiosis in Encarsia spp. and B. tabad. 

First, both Cardinium contain a plasmid of 50-60 kb with 
many differences in gene content (fig. 3). However, both plas- 
mids contain a short syntenic block of genes, whose nucleo- 
tide content and gene order conservation, as well as the gene 



phylogenies carried out, suggests that both derived from 
an ancestral plasmid present in their LCA (LCA1), and the 
differences must have been accumulated after the split of 
both lineages. These differences are due to the insertion of 
mobile elements, sometimes carrying accessory genes, and to 
the transfer of genes from the chromosome. 

Second, there are also differences in the chromosome of 
both Cardinium strains. The Cardinium cBtQ1 chromosome is 
1 26 kb larger (probably a bit larger because, as a draft 
genome, gaps are not taken into account for this calculation) 
than that of Cardinium cEperl ; nevertheless, this difference is 
not associated with a greater number of genes but to the 
presence of a large number of pseudogenes, most of them 
due to defective transposase encoding genes (table 1).The 
number of repeated sequences in the genome of Cardinium 
cBtQ1 is twice that of Cardinium cEperl, and most of these 
differences are due to a large number of IS in the former 
genome (fig. SB). Moreover, some IS types seem to have 
their origins in Alphaproteobacteria, probably related to the 
genera Rickettsia or Wolbachia, supporting the idea of HGT 
events between secondary endosymbionts harbored by the 
same host (Toft and Andersson 2010; Schmitz-Esser et al. 
201 1; Duron 2013). A large number of mobile elements is a 
typical feature of endosymbionts that have established a 
recent relationship with their hosts, such as Candidatus 
Sodalis pierantonius str. SOPE (Gil et al. 2008, Oakeson et 
al. 2014) or Sodalis glossinidius (Belda et al. 2010). Also, en- 
richment in mobile elements seems to be linked to genome 
plasticity in some facultative symbionts (Gillespie et al. 2012). 
Several lines of evidence indicate that IS elements (supplemen- 
tary table S8, Supplementary Material online) are already 
active in Cardinium cBtQ1. This activity combined with both 
the high number of copies throughout the genome, and a 
complete replication and repair machinery that can produce 
recombination, probably underlies the massive number of 
rearrangements in the genome of Cardinium cBtQ1, com- 
pared with Cardinium cEperl and A. asiaticus (fig. 5Q. 

Finally, among the genes with an annotated function and 
differential presence in the two Cardinium strains, we consider 
some may give clues about the relationship between 
Cardinium cBtQ1 and B. tabaci. First, the presence of pseudo- 
genes for pdx7~(pyridoxal biosynthesis) and cgl (cystathionine 
gamma-lyase) and the loss of two genes of the biotin operon 
indicate that these genes were present in LCA1 . This also in- 
dicates that the loss of these functions in Cardinium cBtQ1 is 
associated with settlement in a new environment composed 
by B. tabaci and its facultative symbiont Hamiltonella defensa. 
This symbiont seems to have become established in the pop- 
ulations of B. tabaci Q1 from Western Mediterranean based 
on the detection of frequencies of 1 00% (or almost 1 00%) in 
populations from North Africa and south-western Europe 
(Gueguen et al. 2010). Second, among 71 gene clusters of 
Cardinium cBtQ1, which are absent in Cardinium cEperl, 
there are up to 14 specific genes encoding ankyrin-domain 
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proteins, which can interact with the host's machinery, but 
further studies are needed to understand their functions. 
Finally, the most interesting genes are those whose expression 
has been amplified by either gene duplication (mreB, IpxH, 
yitW, and rtxBDE/tolQ or by their presence in a multicopy 
plasmid (e.g., CHV jp018, a putative toxin-related gene, 
CHV _p021 , a RHS-domain protein, and the gliding genes 
gldKLMN) (supplementary table S2, Supplementary Material 
online). 

On the basis of the ancestor reconstruction analysis, we can 
predict that the four gliding genes were present in LCA4, 
LCA2, and LCA1 and were lost in Cardinium cEperl. 
Although LCA4 conserved full gliding machinery, sprATE was 
lost in LCA2 possibly due to its accommodation to an intracel- 
lular environment. Because LCA1 conserved the gldKLMN 
operon, this suggests that gldKLMN was lost in the 
Cardinium cEperl lineage. Also, as in the closest 
Bacteroidetes genomes, such as A asiaticus, M. tractuosa, or 
Cy. marinum, these genes are located in the chromosome, and 
we can postulate that in Cardinium cBtQ1, they have been 
translocated to a multicopy plasmid conserving the operon 
order. This supports the importance of these genes for 
Cardinium cBtQ1 and suggests that they may explain why 
the strain is not confined to a single tissue in B. tabaci 
(Gottlieb et al. 2008) (supplementary fig. S3, Supplementary 
Material online), in opposition to Cardinium cEperl that is re- 
stricted to the ovaries of Encarsia (Zchori-Fein et al. 2004; Penz 
et al. 2012). Moreover, the gene amplification in Cardinium 
cBtQ1 not only of the four gliding genes but also of mreB and 
of the cluster rtxBDE/tolC suggests that they may play an im- 
portant role in this organism, as genome reduction is an on- 
going process in this strain. There are two possible hypotheses: 
1) those genes are involved in gliding as in other genomes; 2) 
they are involved in the novel type 9 secretion system (PorSS), 
which is also associated with the secretion of proteins involved 
in motility and toxins (Sato et al. 201 0; McBride and Zhu 201 3). 

Different Bacteroidetes possess the ability to move by a 
gliding mechanism, which is related to the ability to degrade 
some components present in the environment such as chitin 
and cellulose (Spormann 1999; McBride 2004; Braun et al. 
2005). Several examples of gliding have been reported in spe- 
cies of the class Cytophagia where C. hertigiiwas included (Xie 
et al. 2007; McBride and Zhu 201 3), and the scattered pattern 
detected for the tissue distribution of Cardinium cBtQ1 in 
B. tabaci might be caused by a similar mechanism. 

The motor model (or focal adhesion) proposed for gliding in 
myxobacteria (Spormann 1 999; Mignot et al. 2007; Jarrell and 
McBride 2008; Nakane et al. 201 3; Nan et al. 201 3) could be 
extended to Bacteroidetes, including Cardinium cBTQ1, al- 
though in a more simplified manner. This model considers 
molecular motors that are associated with cytoskeletal fila- 
ments and use proton motive force to transmit force through 
the cell wall to attached dynamic focal adhesion complexes 
(adhesins) to the substrate, causing the cell to move forward 



(Mignot et al. 2007; Sun et al. 201 1 ; Nan et al. 201 3). MreB, 
the homolog of the eukaryotic actin, has been proposed as 
the cytoskeletal part of the gliding machinery (Kearns 2007; 
Mauriello et al. 2010). Also, there may be an association with 
FtsZ, a protein that is part of the bacterium cytoskeleton and 
can produce force by itself (Erickson et al. 2010). Linkage be- 
tween the cytoskeleton and gliding is supported by experi- 
mental data with the use of compound A22, which is able 
to affect the MreB structure and inhibits the gliding motility in 
Myxobacteria (Nan et al. 2011). Eleven genes, detected in 
most Bacteroidetes, have been defined as the core of the 
gliding machinery. Four of these genes {gldB, gldD, gldH, 
and gldJ) have unknown function, whereas the remaining 
seven genes (gldK, gldL, gldM, gldN, sprA, sprE, and sprT) 
also encode the PorSS system (McBride and Zhu 2013). 

Cardinium cBtQ1 does not show the complete gliding ma- 
chinery as it only contains four gliding core genes (gldKLMN). 
Neither homologous nor potential analogous genes of 
gldBDHJ have been detected. The sprAET genes are also 
absent, but their function would potentially be substituted 
by the cluster rtxBDE/tolC, which is also duplicated. RTX secre- 
tion system belongs to the T1 SS and is able to transport pro- 
teins from the cytosol to the extracellular space in a SecYEG 
independent manner. Also, T1SS is able to secrete many dif- 
ferent RTX family proteins and proteins without the C-terminal 
RTX nonapeptide (Linhartova et al. 2010; Kaur et al. 2012). 
The RTX system would secrete the adhesins (or other proteins 
that could interact with the host) across the bacterial mem- 
brane. However, we did not detect any orthologs to known 
adhesin proteins, but proteins with eukaryotic domains, such 
as ankyrins, TPR or WH2 (found in this strain), may function as 
adhesins in a multicellular eukaryotic organism. Moreover, 
Cardinium may be able to manipulate the host cytoskeleton 
to form a scaffold, which could be used by the gliding ma- 
chinery (Haglund et al. 2010). Gliding seems to be a wide- 
spread direct invasion mechanism for different kinds of cells 
(Sibley et al. 1998; Furusawa et al. 2003; Sibley et al. 2004), 
thus Cardinium cBtQIand other Bacteroidetes might use this 
motor model system to invade new hosts or host tissues. In 
fact, Cardinium endosymbiont of Ixodes scapularis has been 
cultivated on insect cell lines and is capable of infecting new 
cells, even cell lines from different insect species, when they 
are added to the culture (Morimoto et al. 2006). 

The second hypothesis would consider that the gldKLMN 
operon is not involved in gliding, but it is just required for se- 
cretion forming the PorSS system, which was initially described 
for Porphyromonas gingivalis as a novel secretion system with 
eight proteins involved (PorK, PorL, PorM, PorN, PorT, PorW, 
Sov, and PorP) (Sato et al. 201 0). Putatively orthologous genes 
in the gliding system for the first seven are gldK, gldL, gldM, 
gldN, sprT, sprE, and sprA. The proposed orthologous gene for 
porP in F. johnsoniae was Fjoh_3477. A similar gene was not 
detected in either A. asiaticus or Cardinium. Proteins secreted 
by the PorSS systems are adhesins, as well as some enzymes 
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such as chitinases, and gingipains in F. johnsoniae and P. gin- 
givalis, respectively. Also, proteins secreted by the PorSS secre- 
tion system may contain a conserved C-terminal domain 
(TIGR4131 and 4183) (Sato et al. 2010; McBride and 
Zhu 2013). However, there were not proteins of Cardinium 
cBtQ1 with this domain. In the PorSS system, the presence 
of the protein complex GldKLMN is associated with the gen- 
eration of the energy required for protein secretion by SprTEA. 
However, these proteins are not encoded in the genome of 
Cardinium cBtQ1 (they are also absent in A. asiaticus), and their 
substitution by the T1 SS (RTX system) seems unlikely because 
T1SS has its own ATP-binding cassette, making the energy 
production function of the gldKLM unnecessary. This suggests 
that the PorSS system does not work in Cardinium cBtQ1. 
Thus, we hypothesized that the retention of the necessary 
genes for the gliding movement {gldKLMN) and further ampli- 
fication by the translocation to a multicopy plasmid, together 
with the acquisition of the RTX TISS is associated to the scat- 
tered phenotype of this strain and the ability to glide. 

The frequent presence of Cardinium cBtQ1 in B. tabaci bio- 
types Q and S may be due to its ability to benefit its host (Oliver 
et al. 2008; Feldhaar 201 1; Ferrari and Vavre 201 1), possibly 
related to its motility feature, and/or to the presence of some 
putative toxin-related genes such as CHV _p018 (low e-value 
BLAST hit against RTX toxins of H. defensa from A. pisum) and 
CHV _p021 , which could play a role in intercellular competi- 
tion, intercellular signaling, and insecticidal activity based on 
the presence of the RHS domain (Koskiniemi et al. 201 3). As a 
mobile endosymbiont, Cardinium cBtQ1 may contact a para- 
sitoid directly and secrete insecticidal toxins near it, could 
invade the parasitoid tissue, and kill it by secreting unknown 
toxins or by the cytotoxic effect of lipid A in a nonacclimated 
host (Furusawa et al. 2003; Caspi-Fluger et al. 2011; Rader 
et al. 201 2). Furthermore, other effects increasing host fitness 
cannot be excluded, like heat-stress resistance or maybe some 
advantages conferred by the lipoate supplementation (Moran 
et al. 2008; Moya et al. 2008). However, fitness may only be 
improved in some environments or climate conditions. 

In conclusion, we have reported the genome of Cardinium 
cBtQ1 endosymbiont of B. tabaci. Comparative genomics and 
ancestors' reconstruction of gene content have shed light on 
the drastic reduction in many functional categories that have 
taken place since its free-living ancestor up to the present. The 
loss of several cofactors and amino acid biosynthetic capabil- 
ities, retained in its close relative Cardinium cEperl, rules out 
an important role in host nutrition and suggests a relationship 
with its establishment in B. tabaci and its endosymbionts. The 
genome is still very dynamic, with many active transposable 
elements and rearrangements. On the basis of genomic data, 
we propose that Cardinium cBtQ1 has retained the minimal 
gliding core machinery present in its ancestors, which in com- 
bination with the acquired RTX secretion system might be 
used to move inside its host or invade new hosts. 



Supplementary Material 

Supplementary file S1 is available at Genome Biology and 
Evolution online (http://www.gbe.oxfordjournals.org/). 
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